Topological network alignment uncovers biological function and phylogeny
Sequence comparison and alignment has had an enormous impact on our
understanding of evolution, biology, and disease. Comparison and alignment of
biological networks will likely have a similar impact. Existing network
alignments use information external to the networks, such as sequence, because
no good algorithm for purely topological alignment has yet been devised. In
this paper, we present a novel algorithm, based solely on network topology, that
can be used to align any two networks. We apply it to biological networks to
produce by far the most complete topological alignments of biological networks
to date. We demonstrate that both species phylogeny and detailed biological
function of individual proteins can be extracted from our alignments.
Topology-based alignments have the potential to provide a completely new,
independent source of phylogenetic information. Our alignment of the
protein-protein interaction networks of two very different species, yeast and
human, indicates that even distant species share a surprising amount of network
topology with each other, suggesting broad similarities in internal cellular
wiring across all life on Earth. Comment: Algorithm explained in more detail. Additional analysis added.
Probabilistic Random Walk Models for Comparative Network Analysis
Graph-based systems and data analysis methods have become critical tools in many
fields as they can provide an intuitive way of representing and analyzing interactions between
variables. Owing to advances in measurement techniques, a massive amount of
labeled data that can be represented as nodes on a graph (or network) has been archived
in databases. Meanwhile, new data without label information are continually being generated
and archived. Labeling and identifying characteristics of novel data is an important
first step in utilizing the valuable data in an effective and meaningful way. Comparative
network analysis is an effective computational means to identify and predict the properties
of the unlabeled data by comparing the similarities and differences between well-studied
and less-studied networks. Comparative network analysis aims to identify the matching
nodes and conserved subnetworks across multiple networks to enable a prediction of the
properties of the nodes in the less-studied networks based on the properties of the matching
nodes in the well-studied networks (i.e., transferring knowledge between networks).
One of the fundamental and important questions in comparative network analysis is
how to accurately estimate node-to-node correspondence as it can be a critical clue in
analyzing the similarities and differences between networks. Node correspondence is a
comprehensive similarity that integrates various types of similarity measurements in a
balanced manner. However, there are several challenges in accurately estimating the node
correspondence for large-scale networks. First, the scale of the networks is a critical issue.
Because networks generally contain a large number of nodes, the space of candidate node
correspondences is extremely large, and the combinatorial nature of the problem poses a
computational challenge. Second, although matching nodes and conserved subnetworks exist
in different networks, structural variations such as node insertions and deletions make it difficult to incorporate topological similarity.
In this dissertation, novel probabilistic random walk models are proposed to accurately
estimate node-to-node correspondence between networks. First, we propose a context-sensitive
random walk (CSRW) model. In the CSRW model, the random walker analyzes
the context of its current position and can switch its movement between
a simultaneous walk on both networks and an individual walk on one
of the networks. The context-sensitive nature of the random walker enables the method
to effectively integrate different types of similarities by dealing with structural variations.
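The switching behaviour described above can be illustrated with a toy Markov chain over node pairs. This is a sketch under stated assumptions, not the CSRW model itself: the context test (does any neighbouring node pair have positive similarity?), the hypothetical parameter `p_joint`, and the 50/50 split between the two networks during an individual step are all illustrative choices. The stationary probability of a pair (u, v) is read as a node-correspondence score.

```python
from collections import defaultdict
from itertools import product


def adjacency(edges):
    """Undirected adjacency: node -> set of neighbours."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def pair_walk(adj1, adj2, sim, p_joint=0.8, iters=500):
    """Stationary distribution of a toy context-switching walk over node pairs.

    Assumes every node has at least one neighbour. `sim` maps (u, v) to a
    positive similarity; missing pairs count as 0. `p_joint` is a hypothetical
    probability of a simultaneous step when the context allows one.
    """
    states = list(product(list(adj1), list(adj2)))

    def step(u, v):
        out = defaultdict(float)
        # Context: neighbour pairs (a, b) that look like plausible matches.
        ctx = [(a, b) for a in adj1[u] for b in adj2[v] if sim.get((a, b), 0) > 0]
        joint = p_joint if ctx else 0.0
        total = sum(sim[c] for c in ctx)
        for c in ctx:                      # simultaneous walk on both networks
            out[c] += joint * sim[c] / total
        solo = (1.0 - joint) / 2           # individual walk on one network only
        for a in adj1[u]:
            out[(a, v)] += solo / len(adj1[u])
        for b in adj2[v]:
            out[(u, b)] += solo / len(adj2[v])
        return out

    moves = {s: step(*s) for s in states}
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):                 # lazy power iteration (aperiodic)
        nxt = {s: 0.5 * pi[s] for s in states}
        for s, row in moves.items():
            for t, p in row.items():
                nxt[t] += 0.5 * pi[s] * p
        pi = nxt
    return pi


tri = adjacency([(0, 1), (1, 2), (2, 0)])
pi = pair_walk(tri, tri, {(i, i): 1.0 for i in range(3)})
print(round(pi[(0, 0)], 4), round(pi[(0, 1)], 4))  # 0.2727 0.0303
```

On two copies of a triangle with identity similarity, each diagonal pair ends up with 3/11 of the probability mass versus 1/33 for each mismatched pair, because the simultaneous step keeps pulling the walker back to plausible matches.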
Second, we propose the CUFID (Comparative network analysis Using the steady-state
network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct
an integrated network by inserting pseudo edges between potential matching nodes in
different networks. Then, we design the random walk protocol so that the walker transits more frequently
between potential matching nodes when their node similarity is higher and they share more
matching neighboring nodes. We apply the proposed random walk models to comparative
network analysis problems: global network alignment and network querying. Through
extensive performance evaluations, we demonstrate that the proposed random walk models
can accurately estimate node correspondence and that this leads to improved and reliable
network comparison results.
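A rough sketch of the integrated-network idea follows. It assumes one particular pseudo-edge weight (similarity multiplied by one plus the number of neighbouring candidate pairs), which is only one way to make the walker transit more often between well-supported matches; CUFID's actual transition design is described in the dissertation, not here. The function name `pseudo_edge_flow` is hypothetical.

```python
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency: node -> set of neighbours."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def pseudo_edge_flow(adj1, adj2, sim, iters=2000):
    """Steady-state flow across pseudo edges of a toy integrated network.

    adj1, adj2: symmetric adjacency dicts; sim: (u, v) -> similarity > 0 for
    candidate matches (u in network 1, v in network 2).
    """
    w = defaultdict(dict)
    # Intra-network edges; nodes are tagged so the two networks never collide.
    for tag, adj in (("G1", adj1), ("G2", adj2)):
        for a, nbrs in adj.items():
            for b in nbrs:
                w[(tag, a)][(tag, b)] = 1.0
    # Pseudo edges between candidate matches (hypothetical weighting).
    for (u, v), s in sim.items():
        support = sum(1 for a in adj1[u] for b in adj2[v] if (a, b) in sim)
        w[("G1", u)][("G2", v)] = w[("G2", v)][("G1", u)] = s * (1 + support)
    nodes = list(w)
    P = {}
    for a in nodes:                        # row-normalised transition probabilities
        out = sum(w[a].values())
        P[a] = {b: x / out for b, x in w[a].items()}
    pi = {a: 1.0 / len(nodes) for a in nodes}
    for _ in range(iters):                 # lazy power iteration
        nxt = {a: 0.5 * pi[a] for a in nodes}
        for a in nodes:
            for b, p in P[a].items():
                nxt[b] += 0.5 * pi[a] * p
        pi = nxt
    # Flow across each pseudo edge, summed over both directions.
    return {(u, v): pi[("G1", u)] * P[("G1", u)][("G2", v)]
                    + pi[("G2", v)] * P[("G2", v)][("G1", u)]
            for (u, v) in sim}


path = adjacency([(0, 1), (1, 2)])
flow = pseudo_edge_flow(path, path, {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0, (0, 2): 0.5})
print(max(flow, key=flow.get))  # (1, 1)
```

Because this toy integrated network is undirected, the steady-state flow across an edge works out to be proportional to its weight, so the ranking is driven entirely by how the pseudo edges are weighted. A directed or otherwise non-reversible transition design would not have this property.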
Simultaneous Optimization of Both Node and Edge Conservation in Network Alignment via WAVE
Network alignment can be used to transfer functional knowledge between
conserved regions of different networks. Typically, existing methods use a node
cost function (NCF) to compute similarity between nodes in different networks
and an alignment strategy (AS) to find high-scoring alignments with respect to
the total NCF over all aligned nodes (or node conservation). However, they then
evaluate the quality of their alignments via some other measure that is different
from the node conservation measure used to guide the alignment construction
process. Typically, one measures the amount of conserved edges, but only after
alignments are produced. Hence, a recent attempt aimed to directly maximize the
amount of conserved edges while constructing alignments, which improved
alignment accuracy. Here, we aim to directly maximize both node and edge
conservation during alignment construction to further improve alignment
accuracy. For this, we design a novel measure of edge conservation that (unlike
existing measures that treat each conserved edge the same) weighs each
conserved edge so that edges with highly NCF-similar end nodes are favored. As
a result, we introduce a novel AS, Weighted Alignment VotEr (WAVE), which can
optimize any measure of node and edge conservation, and which can be used with
any NCF or combination of multiple NCFs. Using WAVE on top of established
state-of-the-art NCFs leads to superior alignments compared to the existing
methods that optimize only node conservation or only edge conservation or that
treat each conserved edge the same. And while we evaluate WAVE in the
computational biology domain, it is easily applicable in any domain. Comment: 12 pages, 4 figures
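The difference between node conservation, plain edge conservation, and a weighted edge conservation can be made concrete with a small scoring function. This is a sketch, not WAVE itself (WAVE is the alignment strategy that optimizes such scores). Weighting each conserved edge by the sum of its two end-node NCF similarities is an assumption about the exact form, and `alignment_scores` and `sim` are hypothetical names.

```python
def alignment_scores(edges1, edges2, alignment, sim):
    """Node conservation, edge conservation and a weighted edge conservation.

    alignment: dict mapping nodes of network 1 to nodes of network 2.
    sim(u, v): NCF similarity of u (network 1) and v (network 2), assumed in [0, 1].
    edges1 is assumed to list each undirected edge once.
    """
    edges2 = {frozenset(e) for e in edges2}
    nc = sum(sim(u, v) for u, v in alignment.items())   # total NCF over aligned nodes
    ec, wec = 0, 0.0
    for u, w in edges1:
        if u in alignment and w in alignment:
            if frozenset((alignment[u], alignment[w])) in edges2:
                ec += 1                                  # every conserved edge counts 1
                # Assumed weighting: favour conserved edges with NCF-similar ends.
                wec += sim(u, alignment[u]) + sim(w, alignment[w])
    return nc, ec, wec


table = {("a", "x"): 1.0, ("b", "y"): 0.5, ("c", "z"): 0.0}
score = alignment_scores([("a", "b"), ("b", "c")], [("x", "y"), ("y", "z")],
                         {"a": "x", "b": "y", "c": "z"},
                         lambda u, v: table.get((u, v), 0.0))
print(score)  # (1.5, 2, 2.0)
```

Both edges are conserved, so plain edge conservation scores them equally, while the weighted version gives the edge between the more NCF-similar pairs (a-b) three times the credit of the b-c edge.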
PROPER: global protein interaction network alignment through percolation matching
Background: The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI-network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. Results: In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed on real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. Conclusions: We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch
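The abstract names percolation matching without describing it. The sketch below shows generic seeded percolation graph matching: seed pairs spread marks to neighbouring node pairs, and any pair with at least `r` marks whose nodes are both still unmatched is matched and spreads marks in turn. The threshold `r`, the tie-breaking rule, and the function names are assumptions; PROPER's own implementation may differ, for example in how seeds are chosen.

```python
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency: node -> set of neighbours."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def percolation_match(adj1, adj2, seeds, r=2):
    """Seeded percolation graph matching (generic sketch, not PROPER's code).

    seeds: (u, v) pairs assumed to be correct matches, e.g. from sequence
    similarity. A candidate pair is matched once it has collected at least
    `r` marks from already-matched neighbouring pairs.
    """
    seeds = list(seeds)
    matched = dict(seeds)                  # node of G1 -> node of G2
    used2 = set(matched.values())
    marks = defaultdict(int)
    frontier = list(seeds)                 # matched pairs whose marks are not yet spread
    while True:
        while frontier:
            u, v = frontier.pop()
            for a in adj1[u]:
                for b in adj2[v]:
                    marks[(a, b)] += 1     # each matched pair marks its neighbour pairs
        candidates = [(m, pair) for pair, m in marks.items()
                      if m >= r and pair[0] not in matched and pair[1] not in used2]
        if not candidates:
            return matched
        _, (u, v) = max(candidates)        # most-marked pair first (ties: largest ids)
        matched[u] = v
        used2.add(v)
        frontier.append((u, v))


g = adjacency([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
print(percolation_match(g, g, seeds=[(0, 0), (1, 1)]))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

With two seeds, the pair (2, 2) is the first to collect two marks; once matched, it lets (3, 3) reach the threshold as well. Raising `r` to 3 stops the percolation at the seeds in this small example.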
Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms.
Burkholderia pathogenicity relies on protein virulence factors to control and promote bacterial internalization, survival, and replication within eukaryotic host cells. We recently used yeast two-hybrid (Y2H) screening to identify a small set of novel Burkholderia proteins that were shown to attenuate disease progression in an aerosol infection animal model using the virulent Burkholderia mallei ATCC 23344 strain. Here, we performed an extended analysis, focused primarily on nine B. mallei virulence factors and their interactions with human proteins, to map out how the bacteria can influence and alter host processes and pathways. Specifically, we employed topological analyses to assess the connectivity patterns of targeted host proteins, identify modules of pathogen-interacting host proteins linked to processes promoting infectivity, and evaluate the effect of crosstalk among the identified host protein modules. Overall, our analysis showed that the targeted host proteins generally had a large number of interacting partners and interacted with other host proteins that were also targeted by B. mallei proteins. We also introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm and used it to explore similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica. We inferred putative roles of B. mallei proteins based on the roles of their aligned Y. pestis and S. enterica partners and showed that up to 73% of the predicted roles matched existing annotations. A key insight into Burkholderia pathogenicity derived from these analyses of Y2H host-pathogen interactions is the identification of eukaryotic-specific targeted cellular mechanisms, including the ubiquitination degradation system and the use of the focal adhesion pathway as a fulcrum for transmitting mechanical forces and regulatory signals.
Targeting these mechanisms allows the pathogen to modulate and adapt the host-cell environment for the successful establishment of host infection and intracellular spread.
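The role-inference step (predicting a B. mallei protein's role from its aligned Y. pestis or S. enterica partner, then checking predictions against existing annotations) can be sketched as a simple knowledge transfer. The sketch assumes the alignment is already given, since HPIA's alignment procedure is not described here, and assumes that a prediction "matches" when it shares at least one role label with the annotation. All protein and role names are made up.

```python
def transfer_roles(alignment, partner_roles):
    """Predict roles for query proteins from the roles of their aligned partners.

    alignment: query protein -> aligned partner protein (hypothetical input).
    partner_roles: partner protein -> set of known role labels.
    Queries whose partner has no known role get no prediction.
    """
    return {q: set(partner_roles[p]) for q, p in alignment.items()
            if partner_roles.get(p)}


def match_rate(predicted, annotated):
    """Fraction of annotated queries whose predicted roles share a label with the annotation."""
    scored = [q for q in predicted if q in annotated]
    if not scored:
        return 0.0
    return sum(1 for q in scored if predicted[q] & annotated[q]) / len(scored)


pred = transfer_roles({"bm1": "yp1", "bm2": "se1", "bm3": "yp9"},
                      {"yp1": {"adhesion"}, "se1": {"secretion"}})
print(match_rate(pred, {"bm1": {"adhesion", "invasion"}, "bm2": {"motility"}}))  # 0.5
```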
List of proteins evaluated in high-throughput yeast two-hybrid assay and number of host protein-protein interactions.
Host pathways targeted by Coxiella.
C. burnetii-interacting host proteins are present in interconnected Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with the potential to affect multiple cellular processes of the host. The pathways are grouped into five major categories: RNA processing, protein processing, degradation pathways, signaling (including signaling events related to the immune response), and metabolism. The size of a star indicates the number of targeted host proteins in each pathway. ECM, extracellular matrix; ER, endoplasmic reticulum; ErbB, erythroblastic leukemia viral oncogene; ESCRT, endosomal sorting complexes required for transport; MAPK, mitogen-activated protein kinase; NOD, nucleotide-binding oligomerization domain; PIK3, phosphatidylinositol-3-kinases; TCA, tricarboxylic acid; TGF, transforming growth factor.
Mechanisms of action of Coxiella burnetii effectors inferred from host-pathogen protein interactions
Coxiella burnetii is an obligate Gram-negative intracellular pathogen and the etiological agent of Q fever. Successful infection requires a functional Type IV secretion system, which translocates more than 100 effector proteins into the host cytosol to establish the infection, restructure the intracellular host environment, and create a parasitophorous vacuole where the replicating bacteria reside. We used yeast two-hybrid (Y2H) screening of 33 selected C. burnetii effectors against whole-genome human and murine proteome libraries to generate a map of potential host-pathogen protein-protein interactions (PPIs). We detected 273 unique interactions between 20 pathogen and 247 human proteins, and 157 between 17 pathogen and 137 murine proteins. We used orthology to combine the data and create a single host-pathogen interaction network containing 415 unique interactions between 25 C. burnetii and 363 human proteins. We further performed complementary pairwise Y2H testing of 43 out of 91 C. burnetii-human interactions involving five pathogen proteins. We used the combined data to (1) perform enrichment analyses of targeted host cellular processes and pathways, (2) examine effectors with known infection phenotypes, and (3) infer potential mechanisms of action for four effectors with uncharacterized functions. The host-pathogen interaction profiles supported known Coxiella phenotypes, such as adapting cell morphology through cytoskeletal re-arrangements, protein processing and trafficking, organelle generation, cholesterol processing, innate immune modulation, and interactions with the ubiquitin and proteasome pathways. The generated dataset of PPIs, the largest collection of unbiased Coxiella host-pathogen interactions to date, represents a rich source of information with respect to secreted pathogen effector proteins and their interactions with human host proteins.
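The orthology step that merges the human and murine screens into one network can be sketched as mapping each murine host protein to its human ortholog and taking the union of interactions. The one-to-one `mouse_to_human` mapping, the function name, and the identifiers are hypothetical; real ortholog tables can be one-to-many, which would require adding one interaction per ortholog.

```python
def merge_via_orthology(human_ppis, mouse_ppis, mouse_to_human):
    """Combine pathogen-host interactions from two host species into one human network.

    human_ppis, mouse_ppis: iterables of (pathogen_protein, host_protein).
    mouse_to_human: dict mapping a murine protein to its human ortholog
    (hypothetical input). Murine interactions without an ortholog are dropped.
    """
    merged = set(human_ppis)               # the set removes duplicate interactions
    for pathogen, mouse in mouse_ppis:
        human = mouse_to_human.get(mouse)
        if human is not None:
            merged.add((pathogen, human))
    return merged


human = [("effA", "H1"), ("effA", "H2")]
mouse = [("effA", "M1"), ("effB", "M3"), ("effB", "M9")]
net = merge_via_orthology(human, mouse, {"M1": "H1", "M3": "H3"})
print(sorted(net))  # [('effA', 'H1'), ('effA', 'H2'), ('effB', 'H3')]
```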
Topological properties of human proteins interacting with B. mallei.
SD: standard deviation. We evaluated the following properties of the host proteins that interacted with B. mallei proteins, based on the human protein-protein interaction (PPI) network [25]: the number of these host proteins in the human PPI network ($N_p$); the average number of interacting partners (in the human PPI network) of each host protein ($D$); the clustering coefficient, i.e., the density of interactions among a protein's nearest neighbors ($C$); the average shortest path between any two proteins in the set ($SP$); the average number of interacting partners in the human PPI network where both partners interact with B. mallei proteins ($D_i$); and the number of host proteins in the largest connected component ($N_p^{LCC}$). The top three rows show the results for the host proteins present in the PPI network that interacted with the nine known virulence factors, whereas the bottom three rows correspond to host proteins that interacted with all 21 tested B. mallei proteins from the yeast two-hybrid screening (known and putative virulence factors). The results for the randomly selected (498 or 619) human proteins from the entire human PPI network (All PPIs) were generated from $10^3$ random repetitions to obtain averages and standard deviations. The indicated p-values correspond to the probability of the observed properties being different from the randomly selected set from all PPIs.
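The properties listed in this caption can be computed from any PPI network held as an adjacency dictionary. The sketch below is an illustration under assumptions, not the authors' pipeline: the average shortest path uses reachable pairs and allows paths through proteins outside the set, the largest connected component is taken in the subnetwork induced by the set, and the p-value is a one-sided empirical estimate with a +1 correction over random same-size sets (`reps` defaults to 1,000, matching the caption's repetitions). Function names are hypothetical, and the human PPI network itself [25] is not included.

```python
import random
from collections import deque


def adjacency(edges):
    """Undirected adjacency: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, src, allowed=None):
    """Hop distances from src; with `allowed`, the search stays inside that set."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist and (allowed is None or b in allowed):
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that interact."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def set_properties(adj, proteins):
    """Caption-style statistics for the proteins of `proteins` found in `adj`."""
    present = {x for x in proteins if x in adj}
    n = len(present)                       # assumes at least one protein is present
    dists = [d for x in present for y, d in bfs_distances(adj, x).items()
             if y in present and y != x]   # reachable pairs; paths may leave the set
    return {
        "N_p": n,
        "D": sum(len(adj[x]) for x in present) / n,
        "C": sum(clustering(adj, x) for x in present) / n,
        "SP": sum(dists) / len(dists) if dists else float("inf"),
        "D_i": sum(len(adj[x] & present) for x in present) / n,
        "N_p_LCC": max(len(bfs_distances(adj, x, allowed=present)) for x in present),
    }


def empirical_p(adj, proteins, key="D", reps=1000, seed=0):
    """One-sided p-value of `key` against random same-size sets from the network."""
    rng = random.Random(seed)
    observed = set_properties(adj, proteins)
    nodes = sorted(adj)
    hits = sum(set_properties(adj, rng.sample(nodes, observed["N_p"]))[key] >= observed[key]
               for _ in range(reps))
    return (hits + 1) / (reps + 1)


g = adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(set_properties(g, ["A", "B", "C", "Z"])["N_p_LCC"])  # 3
```

Proteins absent from the network (here "Z") are skipped, mirroring the caption's restriction to host proteins present in the PPI network.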